COMPONENT  PART  NOTICE 


This  paper  is  a COMPONENT  PART  of  the  following  COMPILATION  report: 

TITLE: Workshop on Assessing Uncertainty Held in Monterey, California on November 13-14, 1986.

TO ORDER THE COMPLETE COMPILATION REPORT, USE AD-A180 850.

The COMPONENT PART is provided here to allow users access to individually
AUTHORED SECTIONS OF PROCEEDING, ANNALS, SYMPOSIA, ETC. HOWEVER, THE COMPONENT
SHOULD BE CONSIDERED WITHIN THE CONTEXT OF THE OVERALL COMPILATION REPORT AND
NOT AS A STAND-ALONE TECHNICAL REPORT.



AD-P005  295 


A BAYESIAN  VIEW  OF 
ASSESSING  UNCERTAINTY 
AND  COMPARING  EXPERT 
OPINION 


by 


Morris  H.  DeGroot 


Department  of  Statistics 
Carnegie  Mellon  University 
Pittsburgh,  Pennsylvania  15213 


Technical  Report  Number  387 
January,  1987 


Presented at the Workshop on Assessing Uncertainty, Naval Postgraduate School, Monterey, California, November 13-14, 1986. This research was supported in part by the National Science Foundation under grant DMS-8320618.




Abstract 

A Bayesian  approach  to  the  problem  of  comparing  experts  or  expert  systems 
is  presented.  The  question  of  who  is  an  expert  is  considered  and  comparisons 
among well-calibrated experts are studied. The concept of refinement, in various
equivalent  forms,  is  used  in  this  study.  An  informative  example  of  the  combination 
of  the  opinions  of  well-calibrated  experts  is  described.  Total  orderings  of  the  class 
of  well-calibrated  experts  are  derived  from  strictly  proper  scoring  rules. 

Keywords and phrases: Predictions, forecasters, well calibrated, expert systems, combining opinion, scoring rules.


1 Introduction 

In the fields of artificial intelligence and expert systems, the necessity of assessing uncertainty and of coping with that uncertainty by developing methods for decision making under uncertainty is now widely recognized. In this paper, I will argue in favor of the Bayesian approach to assessing uncertainty, and then describe some ways in which this approach can be used to compare experts or expert systems.

The argument in favor of the Bayesian approach proceeds in two steps: (1) The quantitative assessment of uncertainty is in itself a sterile exercise unless that assessment is to be used to make decisions. (2) The Bayesian approach provides the only coherent methodology for decision making under uncertainty (see, e.g., Savage, 1954; DeGroot, 1970; or Lindley, 1987).

The Bayesian approach to the assessment of uncertainty is defined to be the approach in which any uncertainty about the values of various quantities on the part of the decision maker, or of the person receiving information from an expert or an expert system, is represented by that person's subjective joint probability distribution for those values. Indeed, in the fields of artificial intelligence and expert systems, the terms “Bayesian approach” and “probability approach” are often used interchangeably. This usage is appropriate because the Bayesian approach is characterized not, as is sometimes stated, by the repeated use of Bayes' Theorem, but by the ubiquitous specification of probabilities to represent uncertainty.

Two other approaches to the representation of uncertainty in expert systems that have been widely discussed are belief functions (Shafer, 1976, 1982, 1987) and fuzzy logic (Zadeh, 1979, 1983). Both of these approaches can provide reasonable approximations to probability under special conditions when it is not necessary for a decision maker to specify a fully-detailed, high-dimensional joint probability distribution for all of the quantities about which he or she is uncertain in order to be able to choose an effective decision. In general, however, neither of these approaches provides a coherent operational meaning in all decision problems, the way probability does.

Belief functions are closely related to the concept of upper and lower probabilities (Dempster, 1967), whereby the unique probability of an event is replaced by an upper and a lower probability. However, it has always seemed to me to be a step in the wrong direction to say that because it is too difficult to specify a precise number for the probability of some event, we will specify two precise numbers.

There is little doubt that all of these approaches can contribute to the insights that can be gained from a thorough analysis of a particular situation. But, unfortunately, there is a tendency on the part of people, including scientists, to view the world as a dichotomy comprising, on the one hand, the group to which they belong, and on the other, everyone else. Thus, those who follow a Bayesian approach consider the world to be divided into Bayesians and non-Bayesians. I suppose that those who work with belief functions consider the world to be divided into believers and nonbelievers. It is a tribute to the talent and charisma of Professor Zadeh that so many scientists identify with a group that can only be called “fuzzy thinkers,” when the rest of the world must be “clear thinkers.” It is from such a dichotomous outlook that the Bayesian approach is adopted here.

In this paper we will restrict ourselves to problems in which you must determine your subjective probability of some event R, such as the probability that it will rain tomorrow in some particular location, or the probability that a particular patient has a certain disorder. It is assumed that you can consult an expert or an expert system to guide your evaluation of this probability. Thus, you will want to combine the expert's prediction, i.e., the expert's probability of R, with your


already obtained can be induced by means of the concept of strictly proper scoring rules.

2 Who  is  an  expert? 

We continue to consider the situation in which you must determine your subjective probability of some specific future event, and you can consult an expert (or an expert system) and obtain the prediction, i.e., the probability, of that expert. The question arises in this context as to just who should be regarded as an expert. Somewhat surprisingly, most articles regarding the evaluation, comparison, or combination of expert opinion, including my own articles, do not consider this question at all. Some exceptions to this silence are Morris (1974), who states that “We shall refer to . . . a person who provides a judgment concerning uncertain matters as an expert,” and Morris (1977), who defines an expert “to mean anyone with special knowledge about an uncertain quantity or event.” Schervish (1984) writes, “. . . we understand the word expert in a very loose sense. We will assume X is an unknown quantity of interest, and we will call an expert anyone who is willing and able to state some aspect of their subjective distribution for X.” Winkler (1986) describes a “notion of goodness” of a probability appraiser which he calls “expertise” and which “relates to the degree to which the probability appraiser can approach perfect forecasts.” This concept of expertise is closely related to the concepts of calibration and refinement to be discussed in the subsequent sections of this paper.


Two extreme definitions of an expert seem possible. At one extreme, in the spirit of the authors just mentioned, we could define an expert to be anyone or any system that will give you a prediction.


At the other extreme, in this paper we will define an expert to be someone whose prediction you will simply adopt as your own posterior probability without modification. This will be the case if you believe that the expert has all of the information that you have that may be relevant to the occurrence or nonoccurrence of the event, and possibly additional information as well, and you believe that the expert processes all of this information in the way that you would process it if you had the information and the proper technical training. Of course, one way to be certain that the expert or expert system has all of the information that you have is to tell it everything that you know that is relevant.

This definition seems satisfactory if you are dealing with just a single adviser, but it raises conceptual difficulties if two advisers are present. You might very well be willing to adopt the prediction of either adviser as your own posterior probability if that were the only prediction available to you. However, after you have learned the prediction of the first adviser, you may no longer regard the second adviser as an expert according to this definition because, rather than simply accepting the second adviser's prediction, you would typically want to combine it with the first adviser's prediction in some way to develop your own overall posterior probability. Nevertheless, in the presence of just a single advisory system, we can say in accordance with this definition that you have succeeded in building an expert system for yourself if you will accept its prediction in each case that it might handle.

In some of the literature (see, e.g., DeGroot and Eriksson, 1985) an expert or an expert system is said to be well calibrated if you will adopt its prediction as your own posterior probability. Based on the discussion that has just been given here, it would be unnecessary to use the term “well calibrated” in this paper because that property is now simply the defining characteristic of an expert. Nevertheless, for the slight cost of being redundant and the great gain of being clear about the relationship of this paper to other work on the same subject, we will use the term “well-calibrated expert” to denote an expert or a system of this type.

3 Comparing  well-calibrated  experts 

Well-calibrated experts can exhibit a wide variety of different types of predictive behavior. Let X denote the prediction that a particular well-calibrated expert will make in a given situation. In other words, X is the probability that the expert will state for the occurrence of the event R being predicted. Before you learn the prediction of the expert, X is a random variable since you are not certain what the expert's prediction will be.


At one extreme in the class of well-calibrated experts is the perfect forecaster who makes only the predictions $X = 0$ and $X = 1$ and who you know is always correct. In other words, this expert simply states with certainty, and without error, whether or not the event R will occur. Suppose that your prior probability of R is $\mu$ and let $\mu'$ denote your posterior probability of R after learning this expert's prediction. Then $\mu'$ will be either 0 or 1. Since $E(\mu') = \mu$, where the expectation is taken with respect to your prior distribution for $\mu'$, it follows that you must assign probability $\mu$ to the possibility that the expert's prediction will be $X = 1$, and probability $1 - \mu$ to the possibility that $X = 0$.
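Written out, the step from $E(\mu') = \mu$ to the perfect forecaster's prediction probabilities is a one-line calculation, using only the fact stated above that $\mu' = 1$ exactly when the expert predicts $X = 1$ and $\mu' = 0$ exactly when it predicts $X = 0$:

$$\mu = E(\mu') = 1 \cdot \Pr(X = 1) + 0 \cdot \Pr(X = 0) = \Pr(X = 1), \qquad \Pr(X = 0) = 1 - \mu.$$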


At the opposite extreme in the class of well-calibrated experts is the useless forecaster whose prediction you know will be $X = \mu$. In other words, you know that this expert is simply going to repeat your own prior probability back to you. This situation arises when you regard yourself as your own expert or your own expert system.


The basic question that we will now discuss is how to compare other well-calibrated experts whose predictive behavior lies somewhere between the two extremes that have just been described. Much of the discussion to be presented is based on the material in DeGroot and Fienberg (1982, 1983, 1986) and DeGroot and Eriksson (1985), and further details, proofs, and derivations of the results can be found in those references.


In the approach to be followed here, each well-calibrated expert is characterized by your probability distribution for the expert's prediction X. For simplicity, we will assume that X is restricted to lie in a given finite subset $\mathcal{X}$ of the closed unit interval [0, 1]. In effect, we are assuming that the expert's probability of R is always stated to just a fixed number of decimal places. As one example, we are all familiar with the fact that weather forecasters on American television always state their probability of precipitation to just a single decimal place. Hence, each expert can be characterized by the discrete probability function (p.f.) $\nu(x)$ of his or her prediction X.


If the expert reports $X = x$, then your posterior probability of R will be x. Hence, if your prior probability of R is $\mu$, then, as we have already indicated, $E(X) = \mu$. Thus, the comparison of all well-calibrated experts reduces to the comparison of all probability distributions on the set $\mathcal{X}$ with mean $\mu$.
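As an illustration only, this characterization can be sketched in Python. The representation is our assumption, not the paper's: an expert is a dict mapping each possible prediction x to your probability $\nu(x)$, with exact `Fraction` arithmetic and a one-decimal grid like the television forecasts mentioned above. The helper name `is_well_calibrated_pf` is hypothetical; it only checks that $\nu$ is a p.f. on $\mathcal{X}$ with mean $\mu$, which is what the text above says characterizes a well-calibrated expert here.

```python
from fractions import Fraction

# Hypothetical representation: an expert's p.f. nu is a dict {prediction x: nu(x)}.
GRID = [Fraction(i, 10) for i in range(11)]  # predictions stated to one decimal place

def is_well_calibrated_pf(nu, mu, grid=GRID):
    """True if nu is a p.f. on the grid whose mean E(X) equals the prior mu."""
    on_grid = all(x in grid for x in nu)
    nonnegative = all(w >= 0 for w in nu.values())
    sums_to_one = sum(nu.values()) == 1
    mean_is_mu = sum(x * w for x, w in nu.items()) == mu
    return on_grid and nonnegative and sums_to_one and mean_is_mu

mu = Fraction(3, 10)
perfect = {Fraction(0): 1 - mu, Fraction(1): mu}   # predicts 0 or 1, always correct
useless = {mu: Fraction(1)}                          # always repeats your prior
middling = {Fraction(1, 10): Fraction(1, 2), Fraction(5, 10): Fraction(1, 2)}

for name, nu in [("perfect", perfect), ("useless", useless), ("middling", middling)]:
    print(name, is_well_calibrated_pf(nu, mu))      # all three print True
print(is_well_calibrated_pf({Fraction(1, 2): Fraction(1)}, mu))  # False: mean is not mu
```

The perfect and useless forecasters described above both pass, as does any other p.f. on the grid with mean $\mu$.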

Intuitively, it should be clear that the best experts are those about whose predictions you are most uncertain; i.e., whose predictions are most variable. If you are fairly certain in advance what prediction the expert will make (i.e., if the p.f. $\nu(x)$ is tightly concentrated around its mean $\mu$), then there is little gain in consulting the expert. In the next section we shall make this notion rigorous.




4 Refinement 


One well-calibrated expert A is said to be at least as refined as another well-calibrated expert B if we can simulate expert B's prediction from expert A's prediction and an auxiliary randomization. That is, we can simulate B's prediction by passing A's prediction through a noisy channel. Note that this does not mean that we can reproduce B's actual prediction from knowing A's prediction, but rather that we can generate a prediction that has the same stochastic properties as B's prediction. The technical definition of this concept is based on the following notion of stochastic transformations:

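A stochastic transformation $h(y \mid x)$ is a nonnegative function defined on $\mathcal{X} \times \mathcal{X}$ such that

$$\sum_{y \in \mathcal{X}} h(y \mid x) = 1 \quad \text{for every } x \in \mathcal{X}. \tag{4.1}$$

If the experts A and B are characterized by the p.f.'s $\nu_A(x)$ and $\nu_B(x)$, then A is defined to be at least as refined as B if there exists a stochastic transformation $h(y \mid x)$ such that

$$\sum_{x \in \mathcal{X}} h(y \mid x)\,\nu_A(x) = \nu_B(y) \quad \text{for } y \in \mathcal{X}, \tag{4.2}$$

$$\sum_{x \in \mathcal{X}} h(y \mid x)\,x\,\nu_A(x) = y\,\nu_B(y) \quad \text{for } y \in \mathcal{X}. \tag{4.3}$$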
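A small checker may make the three conditions concrete. The sketch below is hypothetical (the function names, the dict p.f.'s, and passing $h(y \mid x)$ as a Python function `h(y, x)` are our assumptions) and uses exact fractions so that the equalities in (4.1)-(4.3) can be tested exactly. A `False` result only says that the particular h supplied fails; refinement asks whether some h exists.

```python
from fractions import Fraction

def is_stochastic_transformation(h, grid):
    """(4.1): h(y | x) >= 0 and the sum over y of h(y | x) equals 1 for every x."""
    return all(
        all(h(y, x) >= 0 for y in grid) and sum(h(y, x) for y in grid) == 1
        for x in grid
    )

def witnesses_refinement(h, nu_a, nu_b, grid):
    """True if h(y | x), passed as h(y, x), satisfies (4.1)-(4.3) for A and B."""
    if not is_stochastic_transformation(h, grid):
        return False
    for y in grid:
        # (4.2): the simulated prediction has B's p.f.
        if sum(h(y, x) * nu_a.get(x, 0) for x in grid) != nu_b.get(y, 0):
            return False
        # (4.3): since Pr(R | X = x) = x, this says Pr(R, simulated = y) = y * nu_B(y),
        # i.e. the simulated prediction is itself well calibrated
        if sum(h(y, x) * x * nu_a.get(x, 0) for x in grid) != y * nu_b.get(y, 0):
            return False
    return True

grid = [Fraction(i, 4) for i in range(5)]          # hypothetical grid {0, 1/4, ..., 1}
mu = Fraction(1, 2)
nu_a = {Fraction(1, 4): Fraction(1, 2), Fraction(3, 4): Fraction(1, 2)}
useless = {mu: Fraction(1)}

def collapse(y, x):
    """Ignore the input prediction and always report the prior mu."""
    return Fraction(1) if y == mu else Fraction(0)

def identity(y, x):
    """Report the input prediction unchanged."""
    return Fraction(1) if y == x else Fraction(0)

print(witnesses_refinement(collapse, nu_a, useless, grid))   # True
print(witnesses_refinement(identity, useless, nu_a, grid))   # False
```

The first call exhibits an h showing that this example expert is at least as refined as the useless forecaster; the second shows only that the identity map does not turn the useless forecaster into that expert.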

The comparison of experts in terms of the concept of refinement is very strong. In fact, it can be shown that if A is at least as refined as B and you are given a choice between learning the prediction of A or the prediction of B, you will prefer to learn that of A, regardless of the decision problem in which the prediction will be used; i.e., regardless of your utility function. The price that must be paid for using this strong method of comparison is that not all experts will be comparable. In other words, the concept of refinement introduces only a partial ordering in the class of p.f.'s $\nu(x)$ with mean $\mu$.

It is easy to verify that the perfect forecaster described in Section 3 is at least as refined as any other well-calibrated expert, and that every well-calibrated expert is at least as refined as the useless forecaster described in that section.
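For the first claim, one stochastic transformation that works (when $0 < \mu < 1$) is $h(y \mid 1) = y\,\nu_B(y)/\mu$ and $h(y \mid 0) = (1 - y)\,\nu_B(y)/(1 - \mu)$; substituting into (4.1)-(4.3) uses only the fact that $\nu_B$ has mean $\mu$. The sketch below, with a hypothetical helper `garble_perfect` and an arbitrary example $\nu_B$, checks these conditions exactly for one case; it illustrates the construction rather than proving it in general. For the second claim, sending every prediction to $\mu$ with probability 1 satisfies (4.1)-(4.3) in the same way.

```python
from fractions import Fraction

def garble_perfect(nu_b, mu):
    """Hypothetical helper: a stochastic transformation h, stored as {(y, x): h(y | x)},
    taking the perfect forecaster's prediction x in {0, 1} to a prediction y with
    p.f. nu_b. Assumes 0 < mu < 1 and that nu_b has mean mu; missing entries are 0."""
    h = {}
    for y, w in nu_b.items():
        h[(y, Fraction(1))] = y * w / mu              # h(y | 1) = y nu_B(y) / mu
        h[(y, Fraction(0))] = (1 - y) * w / (1 - mu)  # h(y | 0) = (1 - y) nu_B(y) / (1 - mu)
    return h

mu = Fraction(3, 10)
perfect = {Fraction(0): 1 - mu, Fraction(1): mu}
nu_b = {Fraction(1, 10): Fraction(1, 2), Fraction(5, 10): Fraction(1, 2)}  # example B
h = garble_perfect(nu_b, mu)

assert all(v >= 0 for v in h.values())
for x in perfect:                        # (4.1): each row h(. | x) sums to 1
    assert sum(h[(y, x)] for y in nu_b) == 1
for y in nu_b:
    # (4.2): the garbled prediction has B's p.f.
    assert sum(h[(y, x)] * perfect[x] for x in perfect) == nu_b[y]
    # (4.3): and it is well calibrated
    assert sum(h[(y, x)] * x * perfect[x] for x in perfect) == y * nu_b[y]
print("(4.1)-(4.3) hold for this h")
```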

We shall now describe several conditions that are equivalent to the proposition that A is at least as refined as B. Each of these equivalent conditions makes it possible to determine whether or not A is at least as refined as B without having to attempt to construct a stochastic transformation h that satisfies the definition (4.2) and (4.3).

The theory of refinement is essentially a reformulation of the theory of the comparison of statistical experiments as developed by Blackwell (1951, 1953), and from that development we can obtain further characterizations of the desired type. For any well-calibrated expert, let F denote the distribution function (d.f.) corresponding to the p.f. $\nu$; i.e., let

$$F(t) = \sum_{x \in \mathcal{X},\; x \le t} \nu(x) \quad \text{for } 0 \le t \le 1. \tag{4.4}$$
Now consider two arbitrary well-calibrated experts A and B, and let $F_A$ and $F_B$ denote their d.f.'s. The following result is analogous to Theorem 12.4.1 in Blackwell and Girshick (1954).

Theorem 1. Expert A is at least as refined as expert B if and only if

$$\int_0^t F_A(x)\,dx \ge \int_0^t F_B(x)\,dx \tag{4.5}$$

for all values of t in the interval $0 \le t \le 1$.

The relationship (4.5) between the d.f.'s $F_A$ and $F_B$ is known as second-order stochastic dominance (see, e.g., Fishburn and Vickson, 1978).

Now let $x_0 < x_1 < \cdots < x_k$ denote the finite number of points in the set $\mathcal{X}$. The following equivalent condition can be derived from (4.5):

Theorem 2. Expert A is at least as refined as expert B if and only if

$$\sum_{i=0}^{j} (x_j - x_i)\,\bigl[\nu_A(x_i) - \nu_B(x_i)\bigr] \ge 0 \quad \text{for } j = 1, \ldots, k - 1. \tag{4.6}$$
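Condition (4.6) is easy to check mechanically, because for a p.f. on $x_0 < \cdots < x_k$ the sum $\sum_{i \le j} (x_j - x_i)\,\nu(x_i)$ equals $\int_0^{x_j} F(x)\,dx$, and both integrals in (4.5) are piecewise linear with kinks only at points carrying positive probability. The following is a hedged sketch (the function names and example p.f.'s are hypothetical) that uses the union of the two supports as the grid, which suffices because points where both p.f.'s vanish add no kinks. The last pair, both with mean 1/2, illustrates the partial ordering: neither is at least as refined as the other.

```python
from fractions import Fraction as Fr

def mean(nu):
    return sum(x * w for x, w in nu.items())

def at_least_as_refined(nu_a, nu_b):
    """Condition (4.6) of Theorem 2 for two p.f.'s (dicts) with the same mean."""
    if mean(nu_a) != mean(nu_b):
        raise ValueError("refinement compares experts with the same mean")
    xs = sorted(set(nu_a) | set(nu_b))              # x_0 < x_1 < ... < x_k
    for j in range(1, len(xs) - 1):
        # sum over i <= j of (x_j - x_i) [nu_A(x_i) - nu_B(x_i)]
        s = sum((xs[j] - xs[i]) * (nu_a.get(xs[i], 0) - nu_b.get(xs[i], 0))
                for i in range(j + 1))
        if s < 0:
            return False
    return True

perfect = {Fr(0): Fr(1, 2), Fr(1): Fr(1, 2)}
useless = {Fr(1, 2): Fr(1)}
a = {Fr(0): Fr(1, 10), Fr(1, 2): Fr(4, 5), Fr(1): Fr(1, 10)}
b = {Fr(1, 4): Fr(1, 2), Fr(3, 4): Fr(1, 2)}

print(at_least_as_refined(perfect, useless))                  # True
print(at_least_as_refined(useless, perfect))                  # False
print(at_least_as_refined(a, b), at_least_as_refined(b, a))   # False False
```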

Another equivalent condition can be presented in terms of the Lorenz curve, which is defined as follows (see, e.g., Gastwirth, 1971):

Suppose that F is the d.f. of an arbitrary non-negative random variable and, for $0 < u < 1$, define

$$F^{-1}(u) = \inf\{t : F(t) \ge u\}. \tag{4.7}$$

The function $F^{-1}$ is called the quantile function corresponding to the d.f. F. If $\mu$ again represents the mean of the distribution with d.f. F, then the Lorenz curve $L(t)$ corresponding to the d.f. F is given by

$$L(t) = \frac{1}{\mu} \int_0^t F^{-1}(u)\,du \quad \text{for } 0 \le t \le 1. \tag{4.8}$$

For any d.f. F, the Lorenz curve $L(t)$ is a convex, nondecreasing function on the interval $0 \le t \le 1$ such that $L(0) = 0$ and $L(1) = 1$. When F is the d.f. of a discrete distribution concentrated on just a finite number of points, as is true of all the d.f.'s we are considering in this paper, then $L(t)$ is also piecewise linear.

Now consider again two well-calibrated experts A and B, and let $L_A$ and $L_B$ denote the Lorenz curves corresponding to their d.f.'s $F_A$ and $F_B$.

Theorem 3. Expert A is at least as refined as expert B if and only if

$$L_A(t) \le L_B(t) \quad \text{for all } 0 \le t \le 1. \tag{4.9}$$
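A minimal sketch of Theorem 3 in Python, assuming dict-based p.f.'s, exact fractions, and hypothetical function names: the Lorenz curve (4.8) of a discrete p.f. can be computed by noting that $F^{-1}(u) = x$ on a u-interval of length $\nu(x)$, and (4.9) only needs to be checked at the points where either curve has a kink. The construction requires $\mu > 0$. The pair `a`, `b` below (both with mean 1/2) is not comparable, so neither curve lies below the other.

```python
from fractions import Fraction as Fr

def lorenz(nu, t):
    """Lorenz curve (4.8) of a discrete p.f. nu (dict) at t, assuming mean mu > 0."""
    mu = sum(x * w for x, w in nu.items())
    acc, remaining = Fr(0), Fr(t)
    for x in sorted(nu):
        # F^{-1}(u) equals x on a u-interval of length nu(x)
        take = min(nu[x], remaining)
        acc += x * take
        remaining -= take
    return acc / mu

def lorenz_below(nu_a, nu_b):
    """Condition (4.9): L_A(t) <= L_B(t) for all t in [0, 1]."""
    knots = {Fr(0), Fr(1)}
    for nu in (nu_a, nu_b):
        c = Fr(0)
        for x in sorted(nu):
            c += nu[x]
            knots.add(c)
    # Both curves are piecewise linear with kinks only at these knots.
    return all(lorenz(nu_a, t) <= lorenz(nu_b, t) for t in knots)

perfect = {Fr(0): Fr(1, 2), Fr(1): Fr(1, 2)}
useless = {Fr(1, 2): Fr(1)}
a = {Fr(0): Fr(1, 10), Fr(1, 2): Fr(4, 5), Fr(1): Fr(1, 10)}
b = {Fr(1, 4): Fr(1, 2), Fr(3, 4): Fr(1, 2)}

print(lorenz_below(perfect, useless), lorenz_below(useless, perfect))  # True False
print(lorenz_below(a, b), lorenz_below(b, a))                          # False False
```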

The next two equivalent conditions that will be presented give additional insight into the relationship of refinement, but do not provide a direct way of verifying that this relationship holds.

Theorem 4. Expert A is at least as refined as expert B if and only if

$$\sum_{x \in \mathcal{X}} g(x)\,\nu_A(x) \ge \sum_{x \in \mathcal{X}} g(x)\,\nu_B(x) \tag{4.10}$$

for every convex function g on the interval [0, 1].
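Theorem 4 suggests a quick screening test. The sketch below (hypothetical names and example p.f.'s) evaluates (4.10) for a finite list of convex functions: $x^2$ and the hinge functions $\max(x - c, 0)$. Passing a finite list does not establish refinement, but a single failure rules it out. Taking $g(x) = x^2$ also shows that, since the means are equal, a more refined expert has at least as large a variance, matching the intuition of Section 3 that the best experts have the most variable predictions.

```python
from fractions import Fraction as Fr

def expect(g, nu):
    """Sum over x of g(x) nu(x) for a p.f. stored as a dict."""
    return sum(g(x) * w for x, w in nu.items())

def passes_convex_tests(nu_a, nu_b, tests):
    """(4.10) checked only for the convex functions in `tests` (a necessary condition)."""
    return all(expect(g, nu_a) >= expect(g, nu_b) for g in tests)

# Illustrative convex functions on [0, 1]: x^2 and hinges max(x - c, 0)
tests = [lambda x: x * x] + [lambda x, c=Fr(i, 8): max(x - c, 0) for i in range(9)]

perfect = {Fr(0): Fr(1, 2), Fr(1): Fr(1, 2)}
useless = {Fr(1, 2): Fr(1)}
a = {Fr(0): Fr(1, 10), Fr(1, 2): Fr(4, 5), Fr(1): Fr(1, 10)}
b = {Fr(1, 4): Fr(1, 2), Fr(3, 4): Fr(1, 2)}

print(passes_convex_tests(perfect, useless, tests))   # True
print(passes_convex_tests(useless, perfect, tests))   # False
print(passes_convex_tests(a, b, tests), passes_convex_tests(b, a, tests))  # False False
```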

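Theorem 5. Expert A is at least as refined as expert B if and only if there exists a stochastic transformation $\eta(x \mid y)$ such that

$$\sum_{y \in \mathcal{X}} \eta(x \mid y)\,\nu_B(y) = \nu_A(x) \quad \text{for } x \in \mathcal{X}, \tag{4.11}$$

$$\sum_{x \in \mathcal{X}} x\,\eta(x \mid y) = y \quad \text{for } y \in \mathcal{X}. \tag{4.12}$$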

Theorem 5 is interesting because it shows that although the definition of A being at least as refined as B depends on the existence of a stochastic transformation from x to y satisfying certain properties, there is an equivalent condition in terms of a stochastic transformation from y to x satisfying certain other properties.

Results of the type that have been presented here are closely related to the theory of majorization, as described, for example, by Marshall and Olkin (1979). Indeed, one final equivalent way of saying that A is at least as refined as B is to say that the p.f. $\nu_A$ majorizes the p.f. $\nu_B$.

5 Two  experts 

As we have stated, if expert A is at least as refined as expert B and you are given a choice between learning either the prediction of expert A or the prediction of expert B (at the same cost), then you will always prefer to learn that of A, regardless of the use you are going to make of the prediction. However, it should also be emphasized that if you can learn the prediction of expert B in addition to the prediction of expert A, then that additional information will often be useful in the sense that it will further modify your posterior probability of R. This is possible because the relationship that A is at least as refined as B depends only on the marginal p.f.'s $\nu_A$ and $\nu_B$ of each expert. When we consider the joint p.f. of their predictions, and the conditional probability of R given both predictions, the situation can change drastically, as the following simple example shows.

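Let X and Y denote the predictions of experts A and B, respectively, and suppose that both X and Y can have only the two possible values 1/4 and 3/4. Suppose also that the joint distribution of X and Y is as follows:

$$\Pr\left(X = \tfrac14, Y = \tfrac14\right) = \tfrac{1}{16}, \qquad \Pr\left(X = \tfrac14, Y = \tfrac34\right) = \Pr\left(X = \tfrac34, Y = \tfrac14\right) = \tfrac{3}{16}, \qquad \Pr\left(X = \tfrac34, Y = \tfrac34\right) = \tfrac{9}{16}, \tag{5.1}$$

and that

$$\Pr\left(R \mid X = \tfrac14, Y = \tfrac14\right) = \Pr\left(R \mid X = \tfrac34, Y = \tfrac34\right) = 1, \qquad \Pr\left(R \mid X = \tfrac14, Y = \tfrac34\right) = \Pr\left(R \mid X = \tfrac34, Y = \tfrac14\right) = 0. \tag{5.2}$$

We will now show that both expert A and expert B are well calibrated:

$$\Pr\left(R \mid X = \tfrac14\right) = \frac{\Pr\left(X = \tfrac14, Y = \tfrac14\right)}{\Pr\left(X = \tfrac14\right)} = \frac{1/16}{4/16} = \frac14 \tag{5.3}$$

and

$$\Pr\left(R \mid X = \tfrac34\right) = \frac{\Pr\left(X = \tfrac34, Y = \tfrac34\right)}{\Pr\left(X = \tfrac34\right)} = \frac{9/16}{12/16} = \frac34. \tag{5.4}$$

Together, (5.3) and (5.4) show that expert A is well calibrated, since the posterior probability of R given A's prediction $X = x$ is simply x itself. The analogous calculation shows that

$$\Pr\left(R \mid Y = \tfrac14\right) = \tfrac14 \quad \text{and} \quad \Pr\left(R \mid Y = \tfrac34\right) = \tfrac34,$$

which proves that expert B is also well calibrated.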

Hence, if you learn either the prediction of expert A or that of expert B, but not both, that prediction will become your posterior probability of R. On the other hand, if you could learn the predictions of both experts A and B, then (5.2) reveals that you would be certain whether or not R will occur. In summary, in this example the combination of two relatively imprecise well-calibrated experts can be completely informative as to whether R will occur.
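The example can be checked with exact arithmetic. The sketch below simply encodes a joint p.f. for (X, Y) that puts probability 1/16 on both predictions equal to 1/4, 3/16 on each mixed pair, and 9/16 on both equal to 3/4, together with conditional probabilities under which R occurs exactly when the two predictions agree, and then recomputes the calibration checks. The helper `posterior` is hypothetical.

```python
from fractions import Fraction as Fr

q1, q3 = Fr(1, 4), Fr(3, 4)
# Joint p.f. of the two predictions (X, Y)
joint = {(q1, q1): Fr(1, 16), (q1, q3): Fr(3, 16), (q3, q1): Fr(3, 16), (q3, q3): Fr(9, 16)}
# Pr(R | X = x, Y = y): R occurs exactly when the predictions agree
p_r = {(q1, q1): 1, (q3, q3): 1, (q1, q3): 0, (q3, q1): 0}

def posterior(keep):
    """Hypothetical helper: Pr(R | the cells (x, y) for which keep(cell) is true)."""
    cells = [c for c in joint if keep(c)]
    return sum(joint[c] * p_r[c] for c in cells) / sum(joint[c] for c in cells)

prior = sum(joint[c] * p_r[c] for c in joint)
print("prior", prior)                                   # prior 5/8
for v in (q1, q3):
    # Pr(R | X = v) and Pr(R | Y = v) both equal v: each expert is well calibrated
    print(v, posterior(lambda c: c[0] == v), posterior(lambda c: c[1] == v))
```

In this example the two marginal p.f.'s are identical (each expert predicts 1/4 with probability 1/4), yet every pair of predictions determines R with certainty.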


6 Ordering  all  experts 


As we have discussed, the relation of refinement induces only a partial ordering of the class of well-calibrated experts. It is natural to try to obtain a total ordering of this class, and one way to accomplish this ordering is to assign a numerical measure of quality to the experts. Thus, we wish to assign a value $m(\nu)$ to each p.f. $\nu$ defined on the set $\mathcal{X}$ and having mean $\mu$. The values $m(\nu)$ should be assigned in such a way that the “better” experts receive the larger values. We interpret this requirement to mean that if expert A is at least as refined as expert B, then $m(\nu_A) \ge m(\nu_B)$, with strict inequality unless the p.f.'s $\nu_A$ and $\nu_B$ are identical. (A function m with this property is called Schur-convex; see, for example, Marshall and Olkin, 1979, or DeGroot and Eriksson, 1985.)

One way to develop appropriate measures of quality is to invoke the concept of strictly proper scoring rules (see, e.g., Stael von Holstein, 1970; Savage, 1971; Winkler, 1977 and 1986; or DeGroot and Fienberg, 1983). Suppose that if an expert's prediction is x and the event R actually occurs, the expert will receive a score $g_1(x)$; whereas if R does not occur, the expert will receive a score $g_2(x)$. We assume that the expert desires to maximize his or her score, so we will assume that $g_1(x)$ is an increasing function of x and that $g_2(x)$ is a decreasing function of x. Together, the pair of functions $(g_1, g_2)$ is said to form a scoring rule.

Consider now the possibility that although an expert's actual subjective probability of R is p, the prediction that the expert reports is x, where x is not necessarily equal to p. (This possibility clearly exists for a human expert, although it may not exist if the expert is actually an expert system, i.e., a computer program.) Under these conditions, the expert's expected score is

$$p\,g_1(x) + (1 - p)\,g_2(x). \tag{6.1}$$

The scoring rule $(g_1, g_2)$ is said to be strictly proper if $x = p$ is the unique value of x that maximizes (6.1).
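As a numerical illustration of strict propriety (not a proof), the sketch below takes the Brier rule given in (6.4) and searches a grid of reported values x for the maximizer of (6.1); for each belief p tried, the maximizer is x = p. The grid search is our choice of method; for the Brier rule one can also see it directly, since the derivative of (6.1) with respect to x is $2(p - x)$, which vanishes only at x = p.

```python
from fractions import Fraction as Fr

def g1(x):
    return -(x - 1) ** 2    # Brier score if R occurs, from (6.4)

def g2(x):
    return -x ** 2          # Brier score if R does not occur, from (6.4)

def expected_score(p, x):
    """Expression (6.1): expected score from reporting x when one's probability is p."""
    return p * g1(x) + (1 - p) * g2(x)

grid = [Fr(i, 100) for i in range(101)]   # candidate reports 0, 0.01, ..., 1
for p in (Fr(1, 10), Fr(37, 100), Fr(9, 10)):
    best = max(grid, key=lambda x: expected_score(p, x))
    print(p, best)          # the best report equals p
```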

The idea behind strictly proper scoring rules is that they are supposed to encourage the expert to report an “honest” prediction because only such a report maximizes the expert's expected score. Of course, for this idea to be effective, one must somehow motivate the expert to want to maximize his or her expected score. Nevertheless, strictly proper scoring rules are precisely the appropriate class of scoring rules that should be considered in order to obtain measures of quality m having the property that we desire.

Suppose therefore that $(g_1, g_2)$ is a strictly proper scoring rule, and let

$$g(x) = x\,g_1(x) + (1 - x)\,g_2(x). \tag{6.2}$$

Then it can be shown (Savage, 1971) that $g(x)$ must be a strictly convex function on the interval $0 \le x \le 1$. Now let the measure of quality m be defined for any p.f. $\nu$ by the relation

$$m(\nu) = \sum_{x \in \mathcal{X}} g(x)\,\nu(x). \tag{6.3}$$

In other words, the measure of quality $m(\nu)$ that you assign to a well-calibrated expert who is characterized by the p.f. $\nu$ is simply your expectation of the score that the expert will receive, before you learn the expert's prediction X. The next result now follows from Theorem 4 and the extra consideration that g is not only convex, but strictly convex.

Theorem 6. If expert A is at least as refined as expert B, then $m(\nu_A) \ge m(\nu_B)$, with strict inequality unless $\nu_A(x) = \nu_B(x)$ for all $x \in \mathcal{X}$.

In summary, each choice of a strictly proper scoring rule leads to a (strictly) Schur-convex measure of quality m, by means of the construction (6.2) and (6.3).

The two most widely known strictly proper scoring rules for the evaluation of forecasters are the Brier scoring rule (Brier, 1950), defined by the relations

$$g_1(x) = -(x - 1)^2, \qquad g_2(x) = -x^2, \tag{6.4}$$

and the logarithmic scoring rule (Good, 1952), defined by the relations

$$g_1(x) = \log x, \qquad g_2(x) = \log(1 - x). \tag{6.5}$$

Others  are  described  in  the  references  already  cited  in  this  paper. 
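For the Brier rule, (6.2) simplifies to $g(x) = -x(1 - x)$, so (6.3) becomes $m(\nu) = \sum_x x^2 \nu(x) - \mu$, which ranks experts with a common mean $\mu$ by the variance of their predictions; for the logarithmic rule, $g(x) = x \log x + (1 - x)\log(1 - x)$, taking $0 \log 0 = 0$. The sketch below (hypothetical names; exact fractions for the Brier case, floating point for the logarithms) evaluates both measures along a chain of experts in which each is at least as refined as the one before, illustrating Theorem 6: both measures increase strictly along the chain.

```python
import math
from fractions import Fraction as Fr

def brier_g(x):
    """(6.2) for the Brier rule (6.4): x*(-(x-1)^2) + (1-x)*(-x^2) = -x(1-x)."""
    return -x * (1 - x)

def log_g(x):
    """(6.2) for the logarithmic rule (6.5), with the convention 0 log 0 = 0."""
    return sum(float(q) * math.log(float(q)) for q in (x, 1 - x) if q > 0)

def quality(nu, g):
    """Measure of quality (6.3): m(nu) = sum over x of g(x) nu(x)."""
    return sum(g(x) * w for x, w in nu.items())

mu = Fr(1, 2)
# Each expert in this chain is at least as refined as the one before it.
chain = [
    ("useless", {mu: Fr(1)}),
    ("middling", {Fr(1, 4): Fr(1, 2), Fr(3, 4): Fr(1, 2)}),
    ("perfect", {Fr(0): Fr(1, 2), Fr(1): Fr(1, 2)}),
]
for name, nu in chain:
    print(name, quality(nu, brier_g), round(quality(nu, log_g), 4))
```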